Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset

Authors

  • Mehdi Naseriparsa
  • Mohammad Mansour Riahi Kashani
Abstract

Classification algorithms are unable to build reliable models on very large datasets. Such datasets contain many irrelevant and redundant features that mislead classifiers. Furthermore, many large datasets have an imbalanced class distribution, which biases classification toward the majority class. This paper proposes combining unsupervised dimensionality reduction with resampling and tests the approach on the LungCancer dataset. In the first step, PCA is applied to the LungCancer dataset to compact it and eliminate irrelevant features; in the second step, SMOTE resampling is carried out to balance the class distribution and increase the variety of the sample domain. Finally, a Naïve Bayes classifier is applied to the resulting dataset, and the evaluation metrics are calculated and compared. The experiments show the effectiveness of the proposed method across four evaluation metrics: overall accuracy, false positive rate, precision, and recall.
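As a rough illustration of the resampling step and the four metrics named in the abstract, the sketch below interpolates new minority-class points in the manner of SMOTE and computes the metrics from binary confusion-matrix counts. It is not the authors' implementation: the function names `smote` and `metrics`, the toy points, the neighbour count `k`, and the confusion counts are all hypothetical, and the PCA and Naïve Bayes steps are omitted.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def metrics(tp, fp, tn, fn):
    """Four metrics from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical toy data: four minority-class points in two dimensions.
minority = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]]
new_points = smote(minority, n_new=6)

# Hypothetical confusion counts, not results from the paper.
scores = metrics(tp=8, fp=2, tn=18, fn=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies, which is what lets the resampling add variety without inventing values outside the observed range.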

Similar resources

Predicting Survival of Patients with Lung Cancer Using Improved Adaptive Neuro-Fuzzy Inference System

Introduction: Lung cancer is the main cause of mortality in both genders worldwide. This disease is caused by the uncontrollable growth and development of cells in both or one of the lungs. Although the early diagnosis of this cancer is not an easy task, the earlier it is diagnosed, the higher will be the chance of treating. The objective of this study was to develop an optimized prediction mod...

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...


ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION

With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing such large data sets poses challenges, including the high storage capacity and the time required for access and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...


Computer-Aided Lung Nodule Recognition by SVM Classifier Based on Combination of Random Undersampling and SMOTE

In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROI) is often used to detect/diagnose lung nodule accurately. However, problems of unbalanced datasets often have detrimental effects on the performance of classification. In this paper, both minority and majority classes are resampled to increase the generalization ability. We propose a nove...

Journal:
  • CoRR

Volume abs/1403.1949

Publication date 2013